Guideline: Guidelines On Technology Availability Management Techniques

Technology Availability Management Techniques

Component Failure Impact Analysis (CFIA)

During planning and designing for technology availability, it is necessary to predict and evaluate the impact on IT Service Availability arising from component failures within the proposed IT Infrastructure and service design.

Component Failure Impact Analysis (CFIA) is a relatively simple technique that can be used to provide this information. It is recommended that CFIA be used in a wide context that reflects the full scope of the technology infrastructure, i.e. hardware, network, software, applications and Users. The technique can also be applied to identify the impact of, and dependencies on, the skills and competencies of the IT support organization staff supporting the new IT Service. This activity is often completed in conjunction with technology continuity management.

CFIA achieves this by providing and indicating:

Single points of failure that can impact technology availability
The impact of component failure on the business operation and users
Component and people dependencies
Component recovery timings
The need to identify and document recovery options
The need to identify and implement risk reduction measures.
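As an illustration only, a CFIA grid can be sketched as a small table of components against IT Services. The component names, service names and the "X" / "A" markings below are assumptions made for this example (one common convention for such grids), not part of this guideline.

```python
# Hypothetical CFIA grid: rows are components, columns are IT Services.
# "X" = failure of the component makes the service unavailable,
# "A" = an alternative component takes over,
# ""  = the service does not depend on the component.
CFIA_GRID = {
    "router-1": {"email": "X", "payroll": "X"},
    "app-server-1": {"email": "A", "payroll": "X"},
    "app-server-2": {"email": "A", "payroll": ""},
    "db-cluster": {"email": "", "payroll": "A"},
}


def single_points_of_failure(grid):
    """Return {component: [services]} for every cell marked "X"."""
    result = {}
    for component, row in grid.items():
        hit = sorted(service for service, mark in row.items() if mark == "X")
        if hit:
            result[component] = hit
    return result


def impact_ranking(grid):
    """Order components by how many services they take down outright."""
    spofs = single_points_of_failure(grid)
    return sorted(spofs, key=lambda c: (-len(spofs[c]), c))


print(single_points_of_failure(CFIA_GRID))
# {'router-1': ['email', 'payroll'], 'app-server-1': ['payroll']}
print(impact_ranking(CFIA_GRID))
# ['router-1', 'app-server-1']
```

A real grid would normally also record recovery timings and the people dependencies listed above; they are left out here to keep the sketch short.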

Fault Tree Analysis (FTA)

This technique is used to determine the chain of events that causes a disruption to IT Services. The events can be combined using logic operators such as AND, OR and Exclusive OR.

Essentially FTA distinguishes the following events:

Basic events - Terminal points of the fault tree, e.g. power failure, operator error. Basic events are not investigated in greater depth; if a basic event is investigated further, it becomes a resulting event.
Resulting events - Intermediate nodes in the fault tree that result from a combination of events. The topmost point in the fault tree is usually a failure of the IT Service.
Conditional events - Events that occur only under certain conditions, e.g. failure of the air-conditioning equipment affects the IT Service only if the equipment temperature exceeds the serviceable values.
Trigger events - Events that trigger other events, e.g. power failure detection equipment can trigger automatic shutdown of IT Services.
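The way events combine through logic operators can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical tree in which the IT Service fails if power is lost (mains failure AND UPS failure) or if the equipment overheats (air-conditioning failure AND temperature above the serviceable value). The class names, event names and tree shape are all invented for the example.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Basic:
    """Terminal point of the tree, not analysed any further."""
    name: str


@dataclass
class Gate:
    """Resulting event: combines its inputs with AND, OR or XOR."""
    name: str
    op: str
    inputs: List[Union["Gate", Basic]]


def occurs(node, state):
    """Return True if the event at `node` occurs, given the basic-event state."""
    if isinstance(node, Basic):
        return state.get(node.name, False)
    results = [occurs(child, state) for child in node.inputs]
    if node.op == "AND":
        return all(results)
    if node.op == "OR":
        return any(results)
    if node.op == "XOR":
        return sum(results) == 1  # exactly one input occurs
    raise ValueError(f"unknown operator: {node.op}")


# Hypothetical fault tree; the topmost node is the failure of the IT Service.
SERVICE_FAILURE = Gate("service_failure", "OR", [
    Gate("power_loss", "AND", [Basic("mains_failure"), Basic("ups_failure")]),
    Gate("overheating", "AND", [Basic("aircon_failure"),
                                Basic("temperature_exceeded")]),
])

print(occurs(SERVICE_FAILURE, {"mains_failure": True}))   # False
print(occurs(SERVICE_FAILURE, {"aircon_failure": True,
                               "temperature_exceeded": True}))  # True
```

The overheating branch mirrors the conditional event described above: the air-conditioning failure alone does not bring the service down.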

The main advantages of FTA include:

It can be used for technology availability calculations
The operations performed on the resulting fault tree correspond with design options
The desired level of detail in the analysis can be chosen.
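To illustrate the first advantage, the sketch below estimates the probability of the top event from the probabilities of the basic events. It assumes that the basic events are statistically independent and that each appears only once in the tree; under those assumptions an AND gate multiplies probabilities and an OR gate gives one minus the product of the complements. The tree and the probability figures are hypothetical.

```python
from math import prod


def failure_probability(node, p):
    """Probability that `node` occurs.

    `node` is either a basic-event name (str) or a tuple
    (operator, child, child, ...). Assumes independent basic events.
    """
    if isinstance(node, str):
        return p[node]
    op, *children = node
    probs = [failure_probability(child, p) for child in children]
    if op == "AND":
        return prod(probs)
    if op == "OR":
        return 1 - prod(1 - q for q in probs)
    raise ValueError(f"unsupported operator: {op}")


# Hypothetical tree: the service fails if the server fails,
# or if both the mains supply and the UPS fail.
TREE = ("OR", ("AND", "mains_failure", "ups_failure"), "server_failure")

# Hypothetical probabilities of each basic event over the period of interest.
P = {"mains_failure": 0.01, "ups_failure": 0.05, "server_failure": 0.002}

unavailability = failure_probability(TREE, P)
print(f"{unavailability:.6f}")      # 0.002499
print(f"{1 - unavailability:.6f}")  # 0.997501
```

Changing the tree, for example adding a second server under an AND gate, shows how a design option alters the calculated availability.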

Service Outage Analysis (SOA)

SOA is a technique that provides a structured approach to identifying end-to-end technology availability improvement opportunities that deliver benefits to the user. Many of the activities involved in SOA are closely aligned with those of Problem Management.

The high-level objectives of SOA are:

To identify the underlying causes of service interruption to the user
To assess the effectiveness of the IT support organisation and key processes
To produce reports detailing the major findings and recommendations
To initiate a programme to implement the agreed recommendations
To measure the technology availability improvements derived from SOA.
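As a small worked example of the last objective, availability is commonly expressed as the percentage of agreed service time during which the service was actually available. The sketch below compares that figure before and after an SOA assignment. The function names and the figures are assumptions made for illustration.

```python
def availability_pct(agreed_minutes, downtime_minutes):
    """Availability as a percentage of the agreed service time."""
    if agreed_minutes <= 0:
        raise ValueError("agreed service time must be positive")
    return 100.0 * (agreed_minutes - downtime_minutes) / agreed_minutes


def improvement_points(agreed_minutes, downtime_before, downtime_after):
    """Change in availability, in percentage points, between two periods."""
    return (availability_pct(agreed_minutes, downtime_after)
            - availability_pct(agreed_minutes, downtime_before))


# Hypothetical figures: 10,000 agreed minutes per period,
# 200 minutes of downtime before the assignment and 50 minutes after.
print(availability_pct(10_000, 200))                   # 98.0
print(availability_pct(10_000, 50))                    # 99.5
print(round(improvement_points(10_000, 200, 50), 2))   # 1.5
```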

The key principles of the SOA approach are:

The underlying reasons for service interruption can be shortfalls in technology, process, procedure or behaviours (culture)
A wide range of data sources is used to support the analysis
Business and user input is fundamental
A specifically mobilised cross-functional team undertakes the analysis
SOA assignments have a recognised sponsor or sponsors (ideally joint sponsorship from IT and the business).

The reasons for adopting an SOA approach are:

Understanding technology availability and its issues from the business and user perspective
Providing a structured, focused and detailed analysis of a selected IT Service or technology infrastructure components
Providing a mechanism to ensure that the IT Infrastructure delivers optimum technology availability.

The benefits of taking an SOA approach are that it:

enables requests for enhanced levels of technology availability to be met without major cost
provides the business with visible commitment from the IT support organisation
develops in-house skills and competencies, avoiding expensive consultancy assignments related to Availability improvement
uses a cross-functional team, which is an enabler for innovative and often inexpensive solutions.

The high-level activity flow of SOA is:

Select Opportunity – Scope Assignment – Plan Assignment – Build Hypothesis – Analyse data – Interview Key Personnel – Findings and Conclusions – Recommendations – Report – Validation – Build Programme.

Technical Observation Post (TOP)

A Technical Observation Post is a technique in which technology availability management brings together a group of specialists on location to monitor and observe real-time activity, with a focus on identifying the cause of certain trends. The TOP is best suited to delivering proactive business and User benefits from within the real-time IT environment.

The benefits of using a TOP as an approach to continuous improvement are that it:

is an informal structure that is comfortable for technical staff and has limited management overhead
is cost effective
creates an environment that can positively harness the technical capabilities of staff
creates a cross functional team that is focused and shares a common sense of purpose
creates an environment for the sharing of information to the benefit of all attending
enables IT support organisation staff to observe the operational environment
can identify areas of improvement masked by inefficient tools, processes and procedures.

Single Point Of Failure Analysis (SPOF)

A SPOF is any configuration item that can cause an incident when it fails and for which no countermeasure has been implemented. A single point of failure may be a person or a step in a process or activity, as well as a component of the IT infrastructure. It is important that no unrecognized SPOFs exist within the IT infrastructure design or the actual technology, and that SPOFs are avoided wherever possible.
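One simple way to look for unrecognized SPOFs in an infrastructure design is to remove each component in turn and check whether the service can still be reached. The sketch below does this for a hypothetical topology. The component names and links are invented for the example, and redundancy is treated as the only countermeasure.

```python
from collections import deque


def reachable(links, start, goal, removed):
    """Breadth-first search from start to goal, skipping the removed component."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in links.get(node, ()):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def find_spofs(links, start, goal):
    """Return the components whose failure alone cuts start off from goal."""
    components = set(links) | {n for targets in links.values() for n in targets}
    components -= {start, goal}
    return sorted(c for c in components
                  if not reachable(links, start, goal, c))


# Hypothetical topology: switches and web servers are duplicated,
# but all traffic passes through one firewall.
LINKS = {
    "user": ["switch-a", "switch-b"],
    "switch-a": ["firewall"],
    "switch-b": ["firewall"],
    "firewall": ["web-1", "web-2"],
    "web-1": ["database"],
    "web-2": ["database"],
}

print(find_spofs(LINKS, "user", "database"))  # ['firewall']
```

The same check could be extended to people or process steps by adding them as nodes, in line with the definition above.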